Abstract


We consider a sequential version of the classical bin packing problem in which items are received 
one by one. Before the size of the next item is revealed, the decision maker needs to decide whether 
the next item is packed in the currently open bin or the bin is closed and a new bin is opened. If 
the new item does not fit, it is lost. If a bin is closed, the remaining free space in the bin accounts 
for a loss. The goal of the decision maker is to minimize the loss accumulated over n periods. We 
present an algorithm that has a cumulative loss not much larger than any strategy in a finite class 
of reference strategies for any sequence of items. Special attention is payed to reference strategies 
that use a fixed threshold at each step to decide whether a new bin is opened. Some positive and 
negative results are presented for this case. 


Keywords: bin packing, on-line learning, prediction with expert advice 


1. Introduction 


In the classical off-line bin packing problem, an algorithm receives items (also called pieces) of size $x_1, x_2, \ldots, x_n \in (0,1]$. We have an infinite number of bins, each with capacity 1, and every item is to be assigned to a bin. Further, the sum of the sizes of the items (also denoted by $x_t$) assigned to any bin cannot exceed its capacity. A bin is empty if no item is assigned to it; otherwise, it is used. The goal of the algorithm is to minimize the number of used bins. This is one of the classical NP-hard problems, and heuristic and approximation algorithms have been investigated thoroughly, see, for example, Coffman et al. (1997).

Another well-studied version of the problem is the so-called on-line bin packing problem. Here items arrive one by one and each item $x_t$ must be assigned to a bin (with free space at least $x_t$) immediately, without any knowledge of the next pieces. In this setting the goal is the same as in the off-line problem, that is, the number of used bins is to be minimized, see, for example, Seiden (2002).

In both the off-line and on-line problems the algorithm has access to the bins in arbitrary order. In this paper we abandon this assumption and introduce a more restricted version that we call sequential bin packing. In this setting items arrive one by one (just like in the on-line problem), but in each round the algorithm has only two possible choices: assign the given item to the (only) open bin or to the “next” empty bin (which then becomes the new open bin); items can no longer be assigned to closed bins. An algorithm thus determines a sequence of binary decisions $i_1, \ldots, i_n$, where $i_t = 0$ means that the next item is assigned to the open bin and $i_t = 1$ means that a new bin is opened and the next item is assigned to that bin. Of course, if $i_t = 0$, then it may happen that the item $x_t$ does not fit in the open bin. In that case the item is “lost.” If the decision is $i_t = 1$, then the remaining empty space in the last closed bin is counted as a loss. The measure of performance we use is the total sum of all lost items and wasted empty space.

Just as in the original bin packing problem, we may distinguish off-line and on-line versions of the sequential bin packing problem. In the off-line sequential bin packing problem the entire sequence $x_1, \ldots, x_n$ is known to the algorithm at the outset. Note that unlike in the classical bin packing problem, the order of the items is relevant. This problem turns out to be computationally significantly easier than its non-sequential counterpart. In Section 3 we present a simple algorithm with running time $O(n^2)$ that minimizes the total loss in the off-line sequential bin packing problem.

Much more interesting is the on-line variant of the sequential bin packing problem. Here the items $x_t$ are revealed one by one, after the corresponding decision $i_t$ has been made. In other words, each decision has to be made without any knowledge of the size of the item. Formulated this way, the problem is reminiscent of an on-line prediction problem, see Cesa-Bianchi and Lugosi (2006). However, unlike in standard formulations of on-line prediction, here the loss the predictor suffers depends not only on the outcome $x_t$ and decision $i_t$ but also on the “state” defined by the fullness of the open bin.

Our goal is to extend the usual bin packing problems to situations in which one can handle only 
one bin at a time, and items must be processed immediately so they cannot wait for bin changes. 
To motivate the on-line sequential model, one may imagine a simple revenue management problem 
in which a decision maker has a unit storage capacity at his disposal. A certain product arrives in 
packages of different size and after each arrival, it has to be decided whether the stored packages 
are shipped or not. (Storage of the product is costly.) If the stored goods are shipped, the entire 
storage capacity becomes available again. If they are not shipped, one waits for the arrival of the
next package. However, if the next package is too large to fit in the remaining open space, it is lost 
(it will be stored in another warehouse). 

In another example application, a sensor collects measurements that can be compressed to
variable size (these are the items). The sensor communicates its measurements by sending frames 
of some fixed size (bins). Since it has limited memory, it cannot store more data than one frame. 
To save energy, the sensor must maximize its throughput (the proportion of useful data in each 
frame) and at the same time minimize data loss (this trade-off is reflected in the definition of the 
loss function). 

Just like in on-line prediction, we compare the performance of an algorithm with the best in 
a pool of reference algorithms (experts). Given a set of N reference strategies, we construct a 




ON-LINE SEQUENTIAL BIN PACKING 


randomized algorithm for the sequential on-line bin packing problem that achieves a cumulative 
loss (measured as the sum of the total wasted capacity and lost items) that is less than the total loss 
of the best strategy in the class (determined in hindsight) plus a quantity of the order of $n^{2/3} \ln^{1/3} N$.

Arguably the most natural comparison class contains all algorithms that use a fixed threshold to decide whether a new bin is opened. In other words, reference predictors are parameterized by a real number $p \in (0,1]$. An expert with parameter $p$ simply decides to open a new bin whenever the remaining free space in the open bin is less than $p$. We call such an expert a constant-threshold strategy. First we point out that obtaining uniform regret bounds for this class is difficult, and we present some impossibility results in relation to this problem. We also offer some data-dependent bounds for an algorithm designed to compete with the best of all constant-threshold strategies, and show that if item sizes are jittered with a certain noise then a uniform regret bound of the order of $n^{2/3} \ln^{1/3} n$ may be achieved.

The principal difficulty of the problem lies in the fact that each action of the decision maker takes 
the problem in a new “state” (determined by the remaining empty space in the open bin) which has 
an effect on future losses. Moreover, the state of the algorithm is typically different from the state 
of the experts which makes comparison difficult. In related work, Merhav et al. (2002) considered 
a similar setup in which the loss function has a “memory,” that is, the loss of a predictor depends on 
the loss of past actions. Furthermore, Even-Dar et al. (2005) and Yu et al. (2009) considered the MDP 
case where the adversarial reward function changes according to some fixed stochastic dynamics. 
However, the present case poses several additional difficulties. First, unlike in Merhav
et al. (2002), but similarly to Even-Dar et al. (2005) and Yu et al. (2009), the loss function has an 
unbounded memory as the state may depend on an arbitrarily long sequence of past predictions. 
Second, the state space is infinite (the [0, 1) interval) and the class of experts we compare to is also 
infinite, in contrast to both of the above papers. However, the special properties of the bin packing 
problem make it possible to design a prediction strategy with small regret. 

Note that the MDP setting of Even-Dar et al. (2005) and Yu et al. (2009) would be an overly pessimistic approach to our problem, as in our case there is a strong connection between the rewards in different states; thus a fully adversarial reward function results in an overestimated worst case.
Also, in the present case, state transitions are deterministically given by the outcome, the previous 
state, and the action of the decision maker, while in the setup of Even-Dar et al. (2005) and Yu et al. 
(2009) transitions are stochastic and depend only on the state and the decision of the algorithm, but 
not on the reward (or on the underlying individual sequence generating the reward). 

We also mention here the related problem of on-line bin packing with rejection, where the algorithm has an opportunity to reject some items and the loss function is the sum of the number of the used bins and the “costs” of the rejected items, see He and Dósa (2005).¹ However, instead of the number of used bins, we use the sum of idle capacities (missed or free spaces) in the used bins to measure the loss.

The following example may help explain the difference between various versions of the problem.


Example 1 Let the sequence of the items be (0.4, 0.5, 0.2, 0.5, 0.5, 0.3, 0.5, 0.1). Then the cumulative loss of the optimal off-line bin packing is 0, and it is 0.4 in the case of sequential off-line bin packing (see Figure 1). In the sequential case the third item (0.2) is rejected.





1. In sequential bin packing we assume that the cost of the items coincides with their size. In this case the optimal 
solution of bin-packing with rejection is trivially to reject all items. 
[Figure 1 panels: a) off-line; b) sequential off-line]

Figure 1: The difference between the optimal solutions for the off-line and sequential off-line problems.


The rest of the paper is organized as follows. In Section 2 the problem is defined formally. 
In Section 3 the complexity of the off-line sequential bin packing problem is analyzed. The main 
results of the paper are presented in Sections 4 and 5. 


2. Setup 


We use terminology borrowed from the theory of on-line prediction with expert advice. Thus, we
call the sequential decisions of the on-line algorithm predictions and we use forecaster as a synonym 
for algorithm. 

We denote by $I_t \in \{0,1\}$ the action of the forecaster at time $t$ (i.e., when $t-1$ items have been received). Action 0 means that the next item will be assigned to the open bin and action 1 represents the fact that a new bin is opened and the next item is assigned to the next empty bin. Note that we assume that we start with an open empty bin; thus for any reasonable algorithm, $I_1 = 0$, and we will restrict our attention to such algorithms. The sequence of decisions up to time $t$ is denoted by $\mathbf{I}_t \in \{0,1\}^t$.

Denote by $s_t \in [0,1)$ the free space in the open (last) bin at time $t \ge 1$, that is, after having placed the items $x_1, x_2, \ldots, x_t$ according to the sequence $\mathbf{I}_t$ of actions. This is the state of the forecaster.


More precisely, the state of the forecaster is defined recursively as follows. As at the beginning we have an empty bin, $s_0 = 1$. For $t = 1, 2, \ldots, n$,

• $s_t = 1 - x_t$ when the algorithm assigns the item to the next empty bin (i.e., $I_t = 1$);
• $s_t = s_{t-1}$ when the assigned item does not fit in the open bin (i.e., $I_t = 0$ and $s_{t-1} < x_t$);
• $s_t = s_{t-1} - x_t$ when the assigned item fits in the open bin (i.e., $I_t = 0$ and $s_{t-1} \ge x_t$).


This may be written in a more compact form:

$$s_t = s(I_t, x_t, s_{t-1}) = I_t (1 - x_t) + (1 - I_t)\left(s_{t-1} - \mathbb{I}_{\{s_{t-1} \ge x_t\}}\, x_t\right)$$

where $\mathbb{I}_{\{\cdot\}}$ denotes the indicator function of the event in brackets, that is, it equals 1 if the event is true and 0 otherwise. The loss suffered by the forecaster at round $t$ is

$$\ell(I_t, x_t \mid s_{t-1}),$$
where the loss function $\ell$ is defined by

$$\ell(0, x \mid s) = \begin{cases} 0, & \text{if } s \ge x; \\ x, & \text{otherwise} \end{cases} \qquad (1)$$

and

$$\ell(1, x \mid s) = s. \qquad (2)$$


The goal of the forecaster is to minimize its cumulative loss defined by 


$$L_t = L_{\mathbf{I}_t} = \sum_{\tau=1}^{t} \ell(I_\tau, x_\tau \mid s_{\tau-1}).$$
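The state recursion and the loss in Equations (1) and (2) fit in a few lines of code. The Python sketch below is only an illustration, not code from the paper: the helper names `next_state`, `loss`, and `cumulative_loss` are hypothetical. It uses exact rational arithmetic (`fractions.Fraction`) so that the comparison $s \ge x$ is not affected by floating-point rounding.

```python
from fractions import Fraction

# Illustrative sketch; function names are hypothetical, not from the paper.

def next_state(action, x, s):
    """State update s_t = s(I_t, x_t, s_{t-1})."""
    if action == 1:          # open a new bin and put x into it
        return 1 - x
    if s >= x:               # item fits into the open bin
        return s - x
    return s                 # item does not fit and is lost; state unchanged

def loss(action, x, s):
    """Per-round loss ell(I_t, x_t | s_{t-1}) from Equations (1) and (2)."""
    if action == 1:
        return s             # closing the bin wastes its free space
    return 0 if s >= x else x  # a lost item costs its size

def cumulative_loss(actions, items, s0=Fraction(1)):
    """Return (L_t, s_t): summed per-round losses and the final state."""
    s, total = s0, Fraction(0)
    for a, x in zip(actions, items):
        total += loss(a, x, s)
        s = next_state(a, x, s)
    return total, s
```

For example, with items $1/2, 3/5$ and actions $0, 0$, the second item does not fit, so the cumulative loss is $3/5$ and the final state is $1/2$.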


In the off-line version of the problem, the entire sequence $x_1, \ldots, x_n$ is given and the solution is the optimal sequence $\mathbf{I}_n^*$ of actions

$$\mathbf{I}_n^* = \arg\min_{\mathbf{I}_n \in \{0,1\}^n} L_{\mathbf{I}_n}.$$

In the on-line version of the problem the forecaster does not know the size of the next items, and the sequence of items can be completely arbitrary. We allow the forecaster to randomize its decisions, that is, at each time instance $t$, $I_t$ is allowed to depend on a random variable $U_t$, where $U_1, \ldots, U_n$ are i.i.d. uniformly distributed random variables in $(0,1]$.

Since we allow the forecaster to randomize, it is important to clarify that the entire sequence of items is determined before the forecaster starts making decisions, that is, $x_1, \ldots, x_n \in (0,1]$ are fixed and cannot depend on the randomizing variables. (This is the so-called oblivious adversary model known in the theory of sequential prediction, see, for example, Cesa-Bianchi and Lugosi 2006.)

The performance of a sequential on-line algorithm is measured by its cumulative loss. It is 
natural to compare it to the cumulative loss of the off-line solution $\mathbf{I}_n^*$. However, it is easy to see
that in general it is impossible to achieve an on-line performance that is comparable to the optimal 
solution. (This is in contrast with the non-sequential counterpart of the bin packing problem in 
which there exist on-line algorithms for which the number of used bins is within a constant factor 
of that of the optimal solution, see Seiden 2002.) 

So in order to measure the performance of a sequential on-line algorithm in a meaningful way, 
we adopt an approach extensively used in on-line prediction (the so-called “experts” framework). 
We define a set of reference forecasters, the so-called experts. The performance of the algorithm is 
evaluated relative to this set of experts, and the goal is to perform asymptotically as well as the best 
expert from the reference class. 

Formally, let $f_{E,t} \in \{0,1\}$ be the decision of an expert $E$ at round $t$, where $E \in \mathcal{E}$ and $\mathcal{E}$ is the set of the experts. This set may be finite or infinite; we consider both cases below. Similarly, we denote the state of expert $E$ by $s_{E,t}$ after the $t$-th item has been revealed. Then the loss of expert $E$ at round $t$ is

$$\ell(f_{E,t}, x_t \mid s_{E,t-1})$$

and the cumulative loss of expert $E$ is

$$L_{E,n} = \sum_{t=1}^{n} \ell(f_{E,t}, x_t \mid s_{E,t-1}).$$
SEQUENTIAL ON-LINE BIN PACKING PROBLEM WITH EXPERT ADVICE

Parameters: set $\mathcal{E}$ of experts, state space $\mathcal{S} = [0,1)$, action space $\mathcal{A} = \{0,1\}$, nonnegative loss function $\ell : (\mathcal{A} \times (0,1] \mid \mathcal{S}) \to [0,1]$, number $n$ of items.
Initialization: $s_0 = 1$ and $s_{E,0} = 1$ for all $E \in \mathcal{E}$.

For each round $t = 1, \ldots, n$,
(a) each expert forms its action $f_{E,t} \in \mathcal{A}$;
(b) the forecaster observes the actions of the experts and forms its own decision $I_t \in \mathcal{A}$;
(c) the next item $x_t \in (0,1]$ is revealed;
(d) the algorithm incurs loss $\ell(I_t, x_t \mid s_{t-1})$ and each expert $E \in \mathcal{E}$ incurs loss $\ell(f_{E,t}, x_t \mid s_{E,t-1})$. The states of the experts and the algorithm are updated.

Figure 2: Sequential on-line bin packing problem with expert advice.


The goal of the algorithm is to perform almost as well as the best expert from the reference class $\mathcal{E}$ (determined in hindsight). Ideally, the normalized difference of the cumulative losses (the so-called regret) should vanish as $n$ grows, that is, one wishes to achieve

$$\limsup_{n \to \infty} \frac{1}{n}\left(L_n - \inf_{E \in \mathcal{E}} L_{E,n}\right) \le 0$$

with probability one, regardless of the sequence of items. This property is called Hannan consistency, see Hannan (1957). The model of sequential on-line bin packing with expert advice is given in Figure 2.

In Sections 4 and 5 we design sequential on-line bin packing algorithms. In Section 4 we assume 
that the class £ of experts is finite. For this case we establish a uniform regret bound, regardless of 
the class and the sequence of items. In Section 5 we consider the (infinite) class of experts defined 
by constant-threshold strategies. This case turns out to be considerably more difficult. We show 
that algorithms, similar (in some sense) to the one developed for the finite expert classes, cannot 
work in general in this situation. We provide a data-dependent regret bound for a generalization 
of the finite-expert algorithm of Section 4, which, in accordance with the previous result, does not 
guarantee consistency in general. However, we show that if the item sizes are jittered with a certain
noise, the regret of the algorithm vanishes uniformly regardless of the original sequence of items. 

But before turning to the on-line problem, we show how the off-line problem can be solved by 
a simple quadratic-time algorithm. 


3. Sequential Off-line Bin Packing 


As is well known, most variants of the bin packing problem are NP-hard, including bin packing with rejection, see He and Dósa (2005), and maximum resource bin packing, see Boyar et al. (2006). In this section we show that the sequential bin packing problem is significantly easier. Indeed, we offer an algorithm to find the optimal sequential strategy with time complexity $O(n^2)$, where $n$ is the number of the items.

The key property is that after the $t$-th item has been received, the $2^t$ possible sequences of decisions cannot lead to more than $t$ different states.


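Lemma 1 For any fixed sequence of items $x_1, x_2, \ldots, x_n$ and for every $1 \le t \le n$,

$$|\mathcal{S}_t| \le t,$$

where

$$\mathcal{S}_t = \{s : s = s_{\mathbf{I}_t},\ \mathbf{I}_t \in \{0,1\}^t\}$$

and $s_{\mathbf{I}_t}$ is the state reached after receiving items $x_1, \ldots, x_t$ with the decision sequence $\mathbf{I}_t$.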


Proof The proof goes by induction. Note that since $I_1 = 0$, we always have $s_{\mathbf{I}_1} = 1 - x_1$, and therefore $|\mathcal{S}_1| = 1$. Now assume that $|\mathcal{S}_{t-1}| \le t-1$. At time $t$, the state of every sequence of decisions with $I_t = 0$ belongs to the set $\mathcal{S}'_t = \{s' : s' = s - \mathbb{I}_{\{s \ge x_t\}} x_t,\ s \in \mathcal{S}_{t-1}\}$ and the state of those with $I_t = 1$ becomes $1 - x_t$. Therefore,

$$|\mathcal{S}_t| \le |\mathcal{S}'_t| + 1 \le |\mathcal{S}_{t-1}| + 1 \le t$$

as desired. ∎
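The induction in the proof of Lemma 1 translates directly into a layer-by-layer computation of the reachable state sets. The Python sketch below is illustrative (the name `reachable_states` is hypothetical, and exact rationals are assumed); it builds $\mathcal{S}_1, \ldots, \mathcal{S}_n$ for the items of Example 1 so that $|\mathcal{S}_t| \le t$ can be checked on a concrete sequence.

```python
from fractions import Fraction

def reachable_states(items):
    """Return the list [S_1, ..., S_n] of reachable state sets (hypothetical helper).

    From every state s in S_{t-1}, action 0 leads to s - x_t if the item fits
    and to s otherwise; action 1 always leads to the single state 1 - x_t.
    """
    layers = []
    current = {Fraction(1)}                                 # S_0 = {s_0}, s_0 = 1
    for x in items:
        nxt = {s - x if s >= x else s for s in current}     # action 0
        nxt.add(1 - x)                                      # action 1
        layers.append(nxt)
        current = nxt
    return layers

example_items = [Fraction(k, 10) for k in (4, 5, 2, 5, 5, 3, 5, 1)]
layers = reachable_states(example_items)
```

Here $\mathcal{S}_1 = \{3/5\}$ and $\mathcal{S}_2 = \{1/10, 1/2\}$: each new layer adds at most the single state $1 - x_t$, exactly as in the proof.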


To describe a computationally efficient algorithm to compute $\mathbf{I}_n^*$, we set up a graph with the set of possible states as a vertex set (there are $O(n^2)$ of them by Lemma 1) and we show that the shortest path on this graph yields the optimal solution of the sequential off-line bin packing problem.

To formalize the problem, consider a finite directed acyclic graph with a set of vertices $V = \{v_1, \ldots, v_{|V|}\}$ and a set of edges $E = \{e_1, \ldots, e_{|E|}\}$. Each vertex $v_k = v(s_k, t_k)$ of the graph is defined by a time index $t_k$ and a state $s_k \in \mathcal{S}_{t_k}$, and corresponds to state $s_k$ reachable after $t_k$ steps. To show the latter dependence, we will write $v_k \in \mathcal{S}_{t_k}$. Two vertices $(v_i, v_j)$ are connected by an edge if and only if $v_i \in \mathcal{S}_{t-1}$, $v_j \in \mathcal{S}_t$ and state $v_j$ is reachable from state $v_i$. That is, by choosing either action 0 or action 1 in state $v_i$, the new state becomes $v_j$ after item $x_t$ has been placed. Each edge has a label and a weight: the label corresponds to the action (zero or one) and the weight equals the loss, depending on the initial state, the action, and the size of the item. Figure 3 shows the proposed graph. Moreover, a sink vertex $v_{|V|}$ is introduced that is connected with all vertices in $\mathcal{S}_n$. These edges have weight equal to the loss of the final states. These losses only depend on the initial state
of the edges. More precisely, for $(v_i, v_{|V|})$ the loss is $s_i$, the free space left in the final bin, where $v_i \in \mathcal{S}_n$.

Notice that there is a one-to-one correspondence between paths from $v_1$ to $v_{|V|}$ and possible sequences of actions of length $n$. Furthermore, the total weight of each path (calculated as the sum of the weights on the edges of the path) is equal to the loss of the corresponding sequence of actions. Thus, if we find a path with minimal total weight from $v_1$ to $v_{|V|}$, we also find the optimal sequence of actions for the off-line bin packing problem. It is well known that this can be done in $O(|V| + |E|)$ time.²

Now by Lemma 1, $|V| \le n(n+1)/2 + 1$, where the additional vertex accounts for the sink. Moreover, it is easy to see that $|E| \le n(n-1) + n = n^2$. Hence the total time complexity of finding the off-line solution is $O(n^2)$.
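Because the graph is layered by time, the shortest path can be computed by dynamic programming over the layers $\mathcal{S}_0, \mathcal{S}_1, \ldots, \mathcal{S}_n$ with back-pointers. The Python sketch below is a minimal illustration under stated assumptions: the function name is hypothetical, exact rationals are used, and the free space $s$ of the last bin is charged at the end (the loss $\ell(1, \cdot \mid s) = s$ of closing it). With that convention it reproduces the value 0.4 of Example 1.

```python
from fractions import Fraction

def offline_sequential_bin_packing(items):
    """Return (minimal total loss, optimal action sequence); hypothetical helper.

    Layer t maps every state reachable after t items to the smallest loss
    with which it can be reached; back-pointers recover the actions.
    """
    layer = {Fraction(1): Fraction(0)}  # S_0 = {1} with zero loss
    parents = []                        # parents[t][state] = (previous state, action)
    for x in items:
        nxt, back = {}, {}
        for s, cost in layer.items():
            for a in (0, 1):
                if a == 1:              # close the bin (lose s) and open a new one
                    step, s_new = s, 1 - x
                elif s >= x:            # item fits into the open bin
                    step, s_new = Fraction(0), s - x
                else:                   # item does not fit and is lost
                    step, s_new = x, s
                if s_new not in nxt or cost + step < nxt[s_new]:
                    nxt[s_new] = cost + step
                    back[s_new] = (s, a)
        layer = nxt
        parents.append(back)
    # Sink edges: add the free space of the last bin to each final state.
    s_end = min(layer, key=lambda s: layer[s] + s)
    total = layer[s_end] + s_end
    actions, s = [], s_end
    for back in reversed(parents):      # walk the back-pointers to the source
        s, a = back[s]
        actions.append(a)
    actions.reverse()
    return total, actions

example_items = [Fraction(k, 10) for k in (4, 5, 2, 5, 5, 3, 5, 1)]
best_loss, best_actions = offline_sequential_bin_packing(example_items)
```

Each layer does constant work per state, so by Lemma 1 the loop touches $O(n^2)$ states in total. For Example 1 the optimal actions are $0,0,0,1,0,1,0,0$: the item 0.2 is lost, and the bins hold $\{0.4, 0.5\}$, $\{0.5, 0.5\}$, and $\{0.3, 0.5, 0.1\}$.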





2. Here we assume the simplified computational model that referring to each vertex (and edge) requires a constant 
number of operations. In a more refined computational model this may be scaled with an extra log |V| factor. 
Figure 3: The graph corresponding to the off-line sequential bin packing problem.


4. Sequential On-line Bin Packing 


In this section we study the sequential on-line bin packing problem with expert advice, as described in Section 2. We deal with two special cases. First we consider finite classes of experts (i.e., reference algorithms) without any assumption on the form or structure of the experts. We construct a randomized algorithm that, with large probability, achieves a cumulative loss not larger than that of the best expert plus $O(n^{2/3} \ln^{1/3} N)$, where $N = |\mathcal{E}|$ is the number of experts.


The following simple lemma is a key ingredient of the results of this section. It shows that in sequential on-line bin packing the cumulative loss is insensitive to the initial state: running the same sequence of decisions from two different initial states changes the cumulative loss by at most 2.


Lemma 2 Let $i_1, \ldots, i_m \in \{0,1\}$ be a fixed sequence of decisions and let $x_1, \ldots, x_m \in (0,1]$ be a sequence of items. Let $s_0, s'_0 \in [0,1)$ be two different initial states. Finally, let $s_0, \ldots, s_m$ and $s'_0, \ldots, s'_m$ denote the sequences of states generated by $i_1, \ldots, i_m$ and $x_1, \ldots, x_m$ starting from initial states $s_0$ and $s'_0$, respectively. Then

$$\left|\sum_{t=1}^{m} \ell(i_t, x_t \mid s_{t-1}) - \sum_{t=1}^{m} \ell(i_t, x_t \mid s'_{t-1})\right| \le s_0 + s'_0 \le 2.$$
i=l 


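Proof Let $m'$ denote the smallest index for which $i_{m'} = 1$. Note that $s_{t-1} = s'_{t-1}$ for all $t > m'$, since after the first action 1 both runs are in state $1 - x_{m'}$. Therefore, we have

$$\begin{aligned}
\sum_{t=1}^{m} \ell(i_t, x_t \mid s'_{t-1}) - \sum_{t=1}^{m} \ell(i_t, x_t \mid s_{t-1})
&= \sum_{t=1}^{m'} \ell(i_t, x_t \mid s'_{t-1}) - \sum_{t=1}^{m'} \ell(i_t, x_t \mid s_{t-1}) \\
&= \sum_{t=1}^{m'-1} \left(\ell(0, x_t \mid s'_{t-1}) - \ell(0, x_t \mid s_{t-1})\right) + \ell(1, x_{m'} \mid s'_{m'-1}) - \ell(1, x_{m'} \mid s_{m'-1}).
\end{aligned}$$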
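Now using the definition of the loss (see Equations 1 and 2), we write

$$\begin{aligned}
\sum_{t=1}^{m} \ell(i_t, x_t \mid s'_{t-1}) - \sum_{t=1}^{m} \ell(i_t, x_t \mid s_{t-1})
&= \sum_{t=1}^{m'-1} x_t \left(\mathbb{I}_{\{s'_{t-1} < x_t\}} - \mathbb{I}_{\{s_{t-1} < x_t\}}\right) + s'_{m'-1} - s_{m'-1} \\
&\le \sum_{t=1}^{m'-1} x_t \left(1 - \mathbb{I}_{\{s_{t-1} < x_t\}}\right) + s'_{m'-1} - s_{m'-1} \\
&\le \sum_{t=1}^{m'-1} x_t \left(1 - \mathbb{I}_{\{s_{t-1} < x_t\}}\right) + s'_0 \\
&\le s_0 + s'_0
\end{aligned}$$

where the next-to-last inequality holds because $s'_{m'-1} \le s'_0$ and $s_{m'-1} \ge 0$, and the last inequality follows from the fact that

$$\begin{aligned}
0 \le s_{m'-1} &= s_{m'-2} - \mathbb{I}_{\{s_{m'-2} \ge x_{m'-1}\}} x_{m'-1} \\
&= s_{m'-3} - \mathbb{I}_{\{s_{m'-3} \ge x_{m'-2}\}} x_{m'-2} - \mathbb{I}_{\{s_{m'-2} \ge x_{m'-1}\}} x_{m'-1} \\
&= \cdots = s_0 - \sum_{t=1}^{m'-1} \mathbb{I}_{\{s_{t-1} \ge x_t\}} x_t .
\end{aligned}$$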


Similarly,

$$\sum_{t=1}^{m} \ell(i_t, x_t \mid s_{t-1}) - \sum_{t=1}^{m} \ell(i_t, x_t \mid s'_{t-1}) \le s_0 + s'_0$$

and the statement follows. ∎


The following example shows that the upper bound of the lemma is tight. 


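Example 2 Let $m = m' = 2$ (that is, $i_1 = 0$ and $i_2 = 1$), $x_1 = s_0$, and $s'_0 < s_0$. Then

$$\begin{aligned}
\sum_{t=1}^{2} \ell(i_t, x_t \mid s'_{t-1}) - \sum_{t=1}^{2} \ell(i_t, x_t \mid s_{t-1})
&= \ell(0, x_1 \mid s'_0) + \ell(1, x_2 \mid s'_1) - \left(\ell(0, x_1 \mid s_0) + \ell(1, x_2 \mid s_1)\right) \\
&= \ell(0, s_0 \mid s'_0) + \ell(1, x_2 \mid s'_0) - \left(\ell(0, s_0 \mid s_0) + \ell(1, x_2 \mid 0)\right) \\
&= s_0 + s'_0 - (0 + 0).
\end{aligned}$$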
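Lemma 2 and the tightness claim of Example 2 are easy to check numerically. The Python sketch below is only a sanity check, not a proof: `run` and `lemma2_gap` are hypothetical helpers, and the random instances use rational item sizes and initial states on a grid of step $1/20$.

```python
import random
from fractions import Fraction

def run(actions, items, s0):
    """Cumulative loss of a fixed decision sequence started from state s0."""
    s, total = s0, Fraction(0)
    for a, x in zip(actions, items):
        if a == 1:           # close the bin, losing its free space
            total += s
            s = 1 - x
        elif s >= x:         # item fits
            s -= x
        else:                # item is lost, state unchanged
            total += x
    return total

def lemma2_gap(actions, items, s0, s0_prime):
    """Absolute difference of the cumulative losses from two initial states."""
    return abs(run(actions, items, s0) - run(actions, items, s0_prime))

# Example 2 with s_0 = 3/4, s'_0 = 1/2, x_1 = s_0, i_1 = 0, i_2 = 1.
s0, s0_prime = Fraction(3, 4), Fraction(1, 2)
example2_gap = lemma2_gap([0, 1], [s0, Fraction(1, 3)], s0, s0_prime)

# Random sanity check of |difference| <= s_0 + s'_0.
rng = random.Random(0)
min_slack = None
for _ in range(2000):
    m = rng.randint(1, 8)
    items = [Fraction(rng.randint(1, 20), 20) for _ in range(m)]
    actions = [rng.randint(0, 1) for _ in range(m)]
    a0, b0 = Fraction(rng.randint(0, 19), 20), Fraction(rng.randint(0, 19), 20)
    slack = a0 + b0 - lemma2_gap(actions, items, a0, b0)
    min_slack = slack if min_slack is None else min(min_slack, slack)
```

In the Example 2 instance the run from $s'_0$ loses the first item ($3/4$) and then pays $1/2$ for closing the bin, while the run from $s_0$ pays nothing, so the gap equals $s_0 + s'_0 = 5/4$.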


Now we consider the on-line sequential bin packing problem when the goal of the algorithm is 
to keep its cumulative loss close to the best in a finite set of experts. In other words, we assume 
that the class of experts is finite, say $|\mathcal{E}| = N$, but we do not assume any additional structure of the
experts. The ideas presented here will be used in Section 5 when we consider the infinite class of 
constant-threshold experts. 

The proposed algorithm partitions the time period $t = 1, \ldots, n$ into segments of length $m$, where $m \le n$ is a positive integer whose value will be specified later. This way we obtain $n' = \lfloor n/m \rfloor$




GYÖRGY, LUGOSI AND OTTUCSAK 


segments of length m, and, if m does not divide n, an extra segment of length less than m. At the 
beginning of each segment, the algorithm selects an expert randomly, according to an exponentially 
weighted average distribution. During the entire segment, the algorithm follows the advice of the 
selected expert. By changing actions so rarely, the algorithm achieves a certain synchronization 
with the chosen expert, since the effect of the difference in the initial states is minor, according to 
Lemma 2. (A similar idea was used in Merhav et al. (2002) in a different context.) The algorithm 
is described in Figure 4. Recall that each expert $E \in \mathcal{E}$ recommends an action $f_{E,t} \in \{0,1\}$ at every time instance $t = 1, \ldots, n$. Since we have $N$ experts, we may identify $\mathcal{E}$ with the set $\{1, \ldots, N\}$. Thus, experts will be indexed by the positive integers $i \in \{1, \ldots, N\}$. At the beginning of each segment, the algorithm chooses expert $i$ randomly, with probability $p_{i,t}$, where the distribution $\mathbf{p}_t = (p_{1,t}, \ldots, p_{N,t})$ is specified in the algorithm. The random selection is made independently for each segment.
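The segmented exponentially weighted forecaster can be sketched in Python as follows. This is a hedged illustration that adds assumptions not made in the paper. Each expert is modelled as a deterministic policy of its own state (`policy(state) -> action`). The initial weights are uniform. Weights are stored as logarithms to avoid underflow. The per-round update multiplies each weight by $e^{-\eta \ell / m}$, which normalizes losses by the segment length; this normalization is an assumption, chosen to be consistent with the parameter choice in Theorem 3. All function names are hypothetical.

```python
import math
import random

def step(action, x, s):
    """One round of the dynamics: returns (loss, new state), per Equations (1)-(2)."""
    if action == 1:
        return s, 1.0 - x
    if s >= x:
        return 0.0, s - x
    return x, s

def segmented_ewa(items, experts, m, eta, rng=None):
    """Sketch of the segmented forecaster; experts map their own state to 0 or 1.

    Returns the algorithm's cumulative loss and each expert's cumulative loss.
    """
    if rng is None:
        rng = random.Random(0)
    n_exp = len(experts)
    log_w = [-math.log(n_exp)] * n_exp   # log w_{i,0} with w_{i,0} = 1/N
    exp_states = [1.0] * n_exp           # s_{i,0} = 1
    exp_losses = [0.0] * n_exp
    s, total, chosen = 1.0, 0.0, 0       # s_0 = 1
    for t, x in enumerate(items):        # t is the 0-based round index
        if t % m == 0:                   # first round of a segment: draw J_t ~ p_t
            top = max(log_w)
            probs = [math.exp(lw - top) for lw in log_w]
            chosen = rng.choices(range(n_exp), weights=probs)[0]
        action = experts[chosen](exp_states[chosen])   # I_t = f_{J_t, t}
        loss, s = step(action, x, s)
        total += loss
        for i, policy in enumerate(experts):
            loss_i, exp_states[i] = step(policy(exp_states[i]), x, exp_states[i])
            exp_losses[i] += loss_i
            log_w[i] -= eta * loss_i / m  # w_{i,t} = w_{i,t-1} exp(-eta * loss / m)
    return total, exp_losses

data_rng = random.Random(1)
items = [1.0 - data_rng.random() for _ in range(1000)]           # sizes in (0, 1]
experts = [lambda s, p=p: int(s < p) for p in (0.2, 0.5, 0.8)]   # threshold policies
alg_loss, expert_losses = segmented_ewa(items, experts, m=10, eta=0.5)
```

Holding the chosen expert fixed for $m$ rounds is what lets Lemma 2 bound the effect of the state mismatch once per segment rather than once per round.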

The following theorem establishes a performance bound of the algorithm. Recall that $L_n$ denotes the cumulative loss of the algorithm while $L_{i,n}$ is that of expert $i$.


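Theorem 3 Let $n, N \ge 1$, $\eta > 0$, $1 \le m \le n$, and $\delta \in (0,1)$. For any sequence $x_1, \ldots, x_n \in (0,1]$ of items, the cumulative loss $L_n$ of the randomized strategy defined in Figure 4 satisfies for all $i = 1, \ldots, N$, with probability at least $1 - \delta$,

$$L_n \le L_{i,n} + \frac{m}{\eta}\ln\frac{1}{w_{i,0}} + \frac{n\eta}{8} + \sqrt{\frac{nm}{2}\ln\frac{1}{\delta}} + \frac{2n}{m} + 2m.$$

In particular, choosing $w_{i,0} = 1/N$ for all $i = 1, \ldots, N$, $m = (16n/\ln(N/\delta))^{1/3}$ and $\eta = \sqrt{8m\ln N/n}$, one has

$$L_n - \min_{i=1,\ldots,N} L_{i,n} \le \frac{3}{\sqrt[3]{2}}\, n^{2/3} \ln^{1/3}\frac{N}{\delta} + 4\sqrt[3]{\frac{2n}{\ln(N/\delta)}}.$$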
me a A S O 

Proof We introduce an auxiliary quantity, the so-called hypothetical loss, defined as the loss the algorithm would suffer if it had been in the same state as the selected expert. This hypothetical loss does not depend on previous decisions of the algorithm. More precisely, the true loss of the algorithm at time instance $t$ is $\ell(I_t, x_t \mid s_{t-1})$ and its hypothetical loss is $\ell(I_t, x_t \mid s_{J_t,t-1})$. Introducing the notation

$$\ell_{i,t} = \ell(f_{i,t}, x_t \mid s_{i,t-1}),$$

the hypothetical loss of the algorithm is just

$$\ell(I_t, x_t \mid s_{J_t,t-1}) = \ell(f_{J_t,t}, x_t \mid s_{J_t,t-1}) = \ell_{J_t,t}.$$

Now it follows by a well-known result of randomized on-line prediction (see, e.g., Lemma 5.1 and Corollary 4.2 in Cesa-Bianchi and Lugosi, 2006) that the hypothetical loss of the sequential on-line bin packing algorithm satisfies simultaneously for all $i = 1, \ldots, N$, with probability at least $1 - \delta$,

$$\sum_{t=1}^{n} \ell_{J_t,t} \le \sum_{t=1}^{n} \ell_{i,t} + \frac{m}{\eta}\ln\frac{1}{w_{i,0}} + \frac{n' m \eta}{8} + \sqrt{\frac{n' m^2}{2}\ln\frac{1}{\delta}} + m \qquad (3)$$

where $n' = \lfloor n/m \rfloor$ and the last $m$ term comes from bounding the difference on the last, not necessarily complete segment. Now we may decompose the regret relative to expert $i$ as follows:

$$L_n - L_{i,n} = \sum_{t=1}^{n} \left(\ell(I_t, x_t \mid s_{t-1}) - \ell_{J_t,t}\right) + \sum_{t=1}^{n} \left(\ell_{J_t,t} - \ell_{i,t}\right).$$
t=1 t=1 
SEQUENTIAL ON-LINE BIN PACKING ALGORITHM

Parameters: Real number $\eta > 0$ and $m \in \mathbb{N}^+$.
Initialization: $s_0 = 1$, $s_{i,0} = 1$ and $w_{i,0} > 0$ are set arbitrarily for $i = 1, \ldots, N$ such that $w_{1,0} + w_{2,0} + \cdots + w_{N,0} = 1$.
For each round $t = 1, \ldots, n$,
(a) If $((t-1) \bmod m) = 0$ then
  - calculate the updated probability distribution
  $$p_{i,t} = \frac{w_{i,t-1}}{\sum_{j=1}^{N} w_{j,t-1}}$$
  for $i = 1, \ldots, N$;
  - randomly select an expert $J_t \in \{1, \ldots, N\}$ according to the probability distribution $\mathbf{p}_t = (p_{1,t}, \ldots, p_{N,t})$;
  otherwise, let $J_t = J_{t-1}$.
(b) Follow the chosen expert: $I_t = f_{J_t,t}$.
(c) The size of the next item $x_t \in (0,1]$ is revealed.
(d) The algorithm incurs loss $\ell(I_t, x_t \mid s_{t-1})$ and each expert $i$ incurs loss $\ell(f_{i,t}, x_t \mid s_{i,t-1})$. The states of the experts and the algorithm are changed.
(e) Update the weights
  $$w_{i,t} = w_{i,t-1}\, e^{-\eta\, \ell(f_{i,t}, x_t \mid s_{i,t-1})/m}$$
  for all $i \in \{1, \ldots, N\}$.

Figure 4: Sequential on-line bin packing algorithm.


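The second term on the right-hand side is bounded using (3). To bound the first term, observe that by Lemma 2,

$$\begin{aligned}
\sum_{t=1}^{n} \ell(I_t, x_t \mid s_{t-1}) - \sum_{t=1}^{n} \ell(I_t, x_t \mid s_{J_t,t-1})
&\le m + \sum_{k=0}^{n'-1} \sum_{t=1}^{m} \left(\ell(I_{km+t}, x_{km+t} \mid s_{km+t-1}) - \ell(I_{km+t}, x_{km+t} \mid s_{J_{km+1},\,km+t-1})\right) \\
&\le m + 2n'
\end{aligned}$$

where in the first inequality we bounded the difference on the last segment separately, and in the second we applied Lemma 2 to each complete segment, on which the algorithm and expert $J_{km+1}$ take the same actions and differ only in their initial states. Combining this with (3) and $n' \le n/m$ gives the first bound of the theorem; the second follows by substituting the stated values of $w_{i,0}$, $m$, and $\eta$. ∎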
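To see how the two parts of Theorem 3 fit together, the Python sketch below evaluates the general bound at the stated choices of $m$ and $\eta$ and compares it with the closed-form expression. It ignores the fact that $m$ must be rounded to an integer in practice. The function names are illustrative. The comparison relies on $\sqrt{u} + \sqrt{v} \le \sqrt{2(u+v)}$, which is why the general bound should never exceed the closed form.

```python
import math

def theorem3_bound(n, N, delta, m, eta):
    """Right-hand side of Theorem 3 minus L_{i,n}, with w_{i,0} = 1/N."""
    return ((m / eta) * math.log(N) + n * eta / 8
            + math.sqrt(n * m / 2 * math.log(1 / delta))
            + 2 * n / m + 2 * m)

def tuned(n, N, delta):
    """m = (16 n / ln(N/delta))^(1/3) and eta = sqrt(8 m ln N / n)."""
    m = (16 * n / math.log(N / delta)) ** (1 / 3)
    eta = math.sqrt(8 * m * math.log(N) / n)
    return m, eta

def closed_form(n, N, delta):
    """(3 / 2^(1/3)) n^(2/3) ln^(1/3)(N/delta) + 4 (2n / ln(N/delta))^(1/3)."""
    L = math.log(N / delta)
    return 3 / 2 ** (1 / 3) * n ** (2 / 3) * L ** (1 / 3) + 4 * (2 * n / L) ** (1 / 3)

for n, N in [(10**4, 10), (10**6, 1000)]:
    m, eta = tuned(n, N, 0.05)
    print(n, N, round(theorem3_bound(n, N, 0.05, m, eta)), round(closed_form(n, N, 0.05)))
```

At the tuned $\eta$, the first two terms collapse to $\sqrt{mn\ln N/2}$, and at the tuned $m$, the terms $2n/m$ and $2m$ become the $n^{2/3}$ and $n^{1/3}$ parts of the closed form.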
5. Constant-threshold Experts

In this section we address the sequential on-line bin packing problem when the goal is to perform almost as well as the best in the class of all constant-threshold strategies. Recall that a constant-threshold strategy is parameterized by a number $p \in (0,1]$ and it opens a new bin if and only if the remaining empty space in the bin is less than $p$. More precisely, if the state of the algorithm defined by the expert with parameter $p$ is $s_{p,t-1}$, then at time $t$ the expert's advice is $\mathbb{I}_{\{s_{p,t-1} < p\}}$. To simplify notation, we will refer to each expert by its parameter, and, similarly to the previous section, $f_{p,t}$ and $s_{p,t}$ will denote the decision of expert $p$ at time $t$ and its state after the decision, respectively.
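A constant-threshold expert is a one-line policy on top of the state recursion of Section 2. The Python sketch below is illustrative: `constant_threshold_loss` is a hypothetical name, and exact rationals are assumed. It computes $L_{p,n}$ for a given threshold $p$ and initial state $s_{p,0}$.

```python
from fractions import Fraction

def constant_threshold_loss(p, items, s0=Fraction(1)):
    """Cumulative loss L_{p,n} of the constant-threshold expert with parameter p.

    The expert opens a new bin (action 1) exactly when the free space of its
    open bin is below p; otherwise it tries the open bin (action 0).
    """
    s, total = s0, Fraction(0)
    for x in items:
        if s < p:                 # f_{p,t} = 1: close the bin, wasting s
            total += s
            s = 1 - x
        elif s >= x:              # f_{p,t} = 0 and the item fits
            s -= x
        else:                     # f_{p,t} = 0 and the item is lost
            total += x
    return total
```

For the items $1/2, 3/5$, the expert with $p = 1/2$ keeps the bin open and loses the second item (loss $3/5$). The expert with $p = 3/5$ closes the half-full bin instead (loss $1/2$). This illustrates how the loss depends on $p$ in a non-smooth way.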

The difficulty in this setup is that there are uncountably many constant-threshold experts. The simplest possibility is to discretize the class. For example, by considering the set of constant-threshold experts with values of $p$ in the set $\{1/N, 2/N, \ldots, 1\}$ and using the randomized algorithm described in the previous section, we immediately obtain that the cumulative regret of the algorithm, when compared to the best constant-threshold expert with $p$ in this set, is bounded by $O(n^{2/3} \ln^{1/3} N)$ with high probability. It is natural to suspect that if $N$ is large, the loss of the best discretized constant-threshold expert is not much larger than that corresponding to the best (unrestricted) value of $p \in (0,1]$. However, this is not true in general. The next lemma shows that any such discretization is doomed to failure, at least in the worst-case sense. We denote by $L_{p,n}$ the cumulative loss of the constant-threshold expert indexed by $p \in (0,1]$.


Lemma 4 For all $n$ such that $n/4$ is a positive integer and $1/2 < a < b < 1$, there exists a sequence $x_1, \ldots, x_n$ of items such that

$$\sup_{p \in (a,b]} L_{p,n} \le \inf_{p \notin (a,b]} L_{p,n} - \frac{n}{4} + 3$$

for any values of the initial states $s_{p,0} \in [p,1]$, $p \in (0,1]$.³


Proof Given $1/2 < a < b < 1$, we construct a sequence with the announced property. The first fourth of the sequence is defined by $x_1 = 1 - a$ and $x_2 = \cdots = x_{n/4} = 1$. If an expert asks for a new bin after the first item, then it suffers no loss for $t = 2, \ldots, n/4$; thus the cumulative loss up to time $n/4$ is bounded as $L_{p,n/4} \le 1$. Note that any expert with parameter $p > a$ is such: the first item always fits the actual bin since, by the conditions of the lemma, $1 - a < a < p \le s_{p,0}$, but then the empty space becomes $s_{p,0} - (1 - a) \le a < p$, and so expert $p$ opens a new bin. In the case of an expert with parameter $q \le a$, whether the expert opens a new bin depends on the initial state. If the actual bin is left open after the first item, then the expert suffers loss $L_{q,n/4} = n/4 - 1$. In particular, if $s_{q,0} = 1$, then after the first item expert $q$ moves to state $s_{q,1} = a$ and leaves the bin open. Thus, after time $n/4$ an expert either suffers loss at least $n/4 - 1$ (then the parameter of the expert is at most $a$), or it suffers loss at most 1, but then it is in the state $s_{p,n/4} = 1$. Now for the second fourth of the sequence repeat the first one, that is, let $x_{n/4+1} = 1 - a$, $x_{n/4+2} = \cdots = x_{n/2} = 1$. By the above argument we can see that if an expert with parameter $q \le a$ does not suffer large loss up to time $n/4$, then it starts with an empty bin and suffers a large loss in the second fourth of the sequence. Thus, $L_{q,n/2} \ge n/4 - 1$ for any $q \le a$. On the other hand, for any expert $p > a$ we have $L_{p,n/2} \le 2$ and $s_{p,n/2} = 1$.





3. Note that for any expert $p \in (0,1]$, $s_{p,t} \in [p,1]$ for all $t \ge 1$ regardless of the initial state, and so it is natural to restrict the initial state to $[p,1]$ as well.
After this point of time, let $x_{n/2+1} = 1 - b$, $x_{n/2+2} = b$ and repeat this pair of items $n/4$ times.
After receiving $x_{n/2+1} = 1 - b$, every expert with parameter $p \in (a, b]$ keeps the bin open and therefore
does not suffer any loss after receiving the next item. On the other hand, experts with parameter
$r > b$ close the bin, suffer loss $b$, and after $x_{n/2+2} = b$ is received, once again they close the bin and
suffer loss $1 - b$ (here we used the fact that $r > 1 - b$, since we assumed $b > 1/2$). Thus, between
periods $n/2 + 1$ and $n$, all experts with $p \in (a, b]$ suffer zero loss, while experts with parameter $r > b$
suffer loss $n/4$.

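Summarizing, for the sequence

$$
\bigl(1-a, \underbrace{1, 1, \ldots, 1}_{n/4-1 \text{ periods}}, 1-a, \underbrace{1, 1, \ldots, 1}_{n/4-1 \text{ periods}}, \underbrace{1-b, b, 1-b, b, \ldots, 1-b, b}_{n/2 \text{ periods}}\bigr)
$$

we have

$$
L_{p,n} \;\begin{cases} \le 2 & \text{if } p \in (a,b], \\ \ge n/4 - 1 & \text{if } p \le a, \\ \ge n/4 & \text{if } p > b. \end{cases}
$$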
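To make the counting in Lemma 4 concrete, the following Python sketch simulates constant-threshold experts on the sequence above with exact rational arithmetic. It is a sketch under stated assumptions: the decision rule (expert $p$ opens a new bin iff its free space is below $p$) and the loss accounting (closing a bin costs its free space, a lost item costs its size) are our reading of the model, and the helper names `step`, `threshold_loss` and `lemma4_sequence` are hypothetical. With $n = 16$, $a = 3/5$ and $b = 4/5$ it reproduces the three regimes of the lemma.

```python
from fractions import Fraction

def step(s, f, x):
    """One period of the assumed model; returns (loss, new free space).

    s: free space of the open bin before the decision,
    f: decision (1 = close the bin and open a new one), x: item size.
    """
    if f == 1:
        return s, 1 - x            # the free space s is lost; x fits the fresh bin
    if x <= s:
        return Fraction(0), s - x  # x is packed into the open bin
    return x, s                    # x does not fit and is lost

def threshold_loss(p, items, s0=Fraction(1)):
    """Cumulative loss L_{p,n} of the constant-threshold expert p."""
    s, total = s0, Fraction(0)
    for x in items:
        f = 1 if s < p else 0      # open a new bin iff the free space is below p
        loss, s = step(s, f, x)
        total += loss
    return total

def lemma4_sequence(n, a, b):
    """1-a, 1,...,1, 1-a, 1,...,1, then n/4 pairs (1-b, b)."""
    assert n % 4 == 0 and n > 0
    quarter = [1 - a] + [Fraction(1)] * (n // 4 - 1)
    return quarter + quarter + [1 - b, b] * (n // 4)

n, a, b = 16, Fraction(3, 5), Fraction(4, 5)
items = lemma4_sequence(n, a, b)
for p in (Fraction(1, 2), Fraction(7, 10), Fraction(9, 10)):
    print(p, threshold_loss(p, items))
# 1/2 16/5    (p <= a: at least n/4 - 1 = 3)
# 7/10 6/5    (a < p <= b: at most 2)
# 9/10 5      (p > b: at least n/4 = 4)
```

Exact fractions are used so that comparisons such as $s \ge p$ at the boundary values $p = a$ and $p = b$ are not disturbed by rounding.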


Lemma 4 implies that one cannot expect a small regret with respect to all possible constant- 
threshold experts. This is true for any algorithm that, as the one proposed in the previous section, 
divides time into segments and on each segment chooses a constant-threshold expert and acts as 
the chosen expert during the following segment. Recall that this segmentation was necessary to 
make sure that the state of the algorithm gets synchronized with the chosen one. The statement is 
formalized below. 


Theorem 5 Consider any sequential on-line bin packing algorithm that divides time into segments
of lengths $m_1, m_2, \ldots, m_k \ge 3$ (where $\sum_{i=1}^{k} m_i = n$) such that, at the beginning of each segment $m_i$, the
algorithm chooses (in a possibly randomized way) a parameter $p_i \in (0,1]$ and follows this expert
during the segment, that is, $I_t = f_{p_i,t}$ for all $t = \sum_{j=1}^{i-1} m_j + 1, \ldots, \sum_{j=1}^{i} m_j$. Then there exists
a sequence of items $x_1, \ldots, x_n$ such that the loss of the algorithm satisfies, with probability at least
1/2,

$$
\hat{L}_n \ge \inf_{p \in (0,1]} L_{p,n} + \frac{n}{4} - 6k.
$$

Proof We construct the sequence of items using the sequence shown in the proof of Lemma 4 as a
building block. At time 1, divide the interval $(0, 1]$ into $2k$ subintervals of equal length and choose
one of these intervals uniformly at random. Denote the end points of this interval by $(A_1, B_1]$. Then
during the first segment we define the items by

$$
\bigl(1-A_1, \underbrace{1, 1, \ldots, 1}_{\lfloor m_1/4 \rfloor - 1 \text{ periods}}, 1-A_1, \underbrace{1, 1, \ldots, 1}_{\lfloor m_1/4 \rfloor - 1 \text{ periods}}, \underbrace{1-B_1, B_1, \ldots, 1-B_1, B_1}_{\lfloor m_1/2 \rfloor \text{ periods}}\bigr).
$$

If $m_1$ is not divisible by 4, we may define the remaining (at most three) items arbitrarily. Then,
according to Lemma 4, if the algorithm does not choose an expert to follow from the interval $(A_1, B_1]$,
then its loss is larger by at least $m_1/4 - 6$ than that of any expert in $(A_1, B_1]$. (The extra 3 comes from
the possibility that $m_1$ is not divisible by 4.) However, no matter how the algorithm chooses the
expert to follow, the probability that it finds the correct subinterval is $1/(2k)$.

To continue the construction, we now divide the interval $(A_1, B_1]$ into $2k$ intervals of equal length
and choose one at random, say $(A_2, B_2]$. We define the next items similarly to the first segment, but
now we make sure that the optimal constant-threshold expert falls in the interval $(A_2, B_2]$, that is,
the items of the second segment are defined by

$$
\bigl(1-A_2, \underbrace{1, 1, \ldots, 1}_{\lfloor m_2/4 \rfloor - 1 \text{ periods}}, 1-A_2, \underbrace{1, 1, \ldots, 1}_{\lfloor m_2/4 \rfloor - 1 \text{ periods}}, \underbrace{1-B_2, B_2, \ldots, 1-B_2, B_2}_{\lfloor m_2/2 \rfloor \text{ periods}}\bigr).
$$

As before, if $m_2$ is not divisible by 4, we may define the remaining (at most three) items arbitrarily.
Once again, the excess loss of the algorithm, when compared to the best constant-threshold expert,
is at least $m_2/4 - 6$, unless the algorithm picks the correct subinterval, which happens with probability $1/(2k)$.

We may continue the randomized construction of the item sizes in the same manner,
always dividing the previously chosen interval into $2k$ equal pieces, choosing one at random, and
constructing the item sequence so that experts in the chosen interval are significantly better than any
other expert.

By the union bound, the probability that the forecaster ever chooses the correct interval is at most
$k \cdot 1/(2k) = 1/2$, so with probability at least 1/2 it never does, and then

$$
\hat{L}_n - \inf_{p \in (0,1]} L_{p,n} \ge \sum_{i=1}^{k} \left( \frac{m_i}{4} - 6 \right) = \frac{n}{4} - 6k,
$$

as desired. ∎
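The randomized construction in the proof above can be sketched in Python as follows. This is an illustrative sketch, not part of the original analysis: the names `segment_items` and `adversarial_sequence` are hypothetical, each segment uses the Lemma 4 pattern on $4\lfloor m_i/4 \rfloor$ periods (that is, $\lfloor m_i/4 \rfloor$ pairs $(1 - B_i, B_i)$), the at most three leftover items are set to 1 as one arbitrary choice, and the side conditions $1/2 < a < b < 1$ of Lemma 4 are not checked. Since each $(A_i, B_i]$ is drawn uniformly among $2k$ candidates independently of the algorithm, any choice of $p_i$ lands in it with probability at most $1/(2k)$.

```python
import random
from fractions import Fraction

def segment_items(m, A, B, filler=Fraction(1)):
    """Items of one segment: the Lemma 4 pattern with a = A, b = B on
    4*floor(m/4) periods, padded with `filler` (an arbitrary choice) to length m."""
    q = m // 4
    quarter = [1 - A] + [Fraction(1)] * (q - 1) if q > 0 else []
    block = quarter + quarter + [1 - B, B] * q
    return block + [filler] * (m - len(block))

def adversarial_sequence(lengths, rng):
    """Nested random subintervals (A_i, B_i] and the items built from them."""
    k = len(lengths)
    A, B = Fraction(0), Fraction(1)
    items, chosen = [], []
    for m in lengths:
        width = (B - A) / (2 * k)          # split the current interval into 2k parts
        j = rng.randrange(2 * k)           # pick one of them uniformly at random
        A, B = A + j * width, A + (j + 1) * width
        chosen.append((A, B))
        items += segment_items(m, A, B)
    return items, chosen

items, chosen = adversarial_sequence([8, 10, 13], random.Random(1))
print(len(items), [B - A for A, B in chosen])
# 31 [Fraction(1, 6), Fraction(1, 36), Fraction(1, 216)]
```

The interval lengths do not depend on the seed: after $i$ segments the chosen interval always has length $(2k)^{-i}$.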


The theorem above shows that if one uses a segmentation for synchronization purposes, one 
cannot expect nontrivial regret bounds that hold uniformly over all possible sequences of items and 
for all constant-threshold experts, unless the number of segments is proportional to n. It seems 
unlikely that without such synchronization one may achieve o(n) regret. Unfortunately, we do not 
have a formal proof for arbitrary algorithms (that do not divide time into segments). 

However, one may still obtain meaningful regret bounds that depend on the data. We derive 
such a bound next. We also show that under some natural restrictions on the item sizes, this result 
allows us to derive regret bounds that hold uniformly over all constant-threshold experts. 

In order to understand the structure of the problem of constant-threshold experts, it is important 
to observe that on any sequence of n items, experts can exhibit only a finite number of different be- 
haviors. In a sense, the “effective” number of experts is not too large and this fact may be exploited 
by an algorithm. 

For $t = 1, \ldots, n$ we call two experts $t$-indistinguishable (with respect to the sequence of items
$x_1, \ldots, x_{t-1}$) if their decision sequences are identical up to time $t$ (note that any two experts are
1-indistinguishable, as all experts $p$ start with the decision $f_{p,1} = 0$). This property defines a natural
partitioning of the class of experts into maximal $t$-indistinguishable sets, where any two experts
that belong to the same set are $t$-indistinguishable, and experts from different sets are not
$t$-indistinguishable. Obviously, there are no more than $2^t$ maximal $t$-indistinguishable sets. This
bound, although finite, is still too large to be useful. However, it turns out that the number of
maximal $t$-indistinguishable sets grows at most quadratically with $t$.
The first step in proving this fact is the next lemma, which shows that the maximal $t$-indistinguishable
expert sets are intervals.


Lemma 6 Let $1 \ge p > r > 0$ be such that expert $p$ and expert $r$ are $t$-indistinguishable. Then
for any $p > q > r$, expert $q$ is $t$-indistinguishable from both experts $p$ and $r$. Thus, the maximal
$t$-indistinguishable expert sets form subintervals of $(0, 1]$.


Proof By the assumption of the lemma the decision sequences of experts $p$ and $r$ coincide, that is,

$$
f_{p,u} = f_{r,u} \quad \text{and} \quad s_{p,u} = s_{r,u}
$$

for all $u = 1, 2, \ldots, t$. Let $t_1, t_2, \ldots$ denote the time instances when expert $p$ (or expert $r$) assigns the
next item to the next empty bin (i.e., $f_{p,u} = 1$ for $u = t_1, t_2, \ldots$). If expert $q$ also decides 1 at time $t_k$
for some $k$, then it will decide 0 for $t = t_k + 1, \ldots, t_{k+1} - 1$ since so does expert $p$ and $p > q$, and will
decide 1 at time $t_{k+1}$ as $q > r$. Thus the decision sequence of expert $q$ coincides with that of experts
$p$ and $r$ for time instances $t_k + 1, \ldots, t_{k+1}$ in this case. Since all experts start with the empty bin at
time 0, the statement of the lemma follows by induction. ∎


Based on the lemma we can identify the $t$-indistinguishable sets by their end points. Let
$Q_t = \{q_{1,t}, \ldots, q_{N_t,t}\}$ denote the set of the end points after receiving $t - 1$ items, where $N_t = |Q_t|$ is the
number of maximal $t$-indistinguishable sets, and $q_{0,t} = 0 < q_{1,t} < q_{2,t} < \cdots < q_{N_t,t} = 1$. Then the
$t$-indistinguishable sets are $(q_{k-1,t}, q_{k,t}]$ for $k = 1, \ldots, N_t$. The next result shows that the number of
maximal $t$-indistinguishable sets cannot grow too fast.


Lemma 7 The number of the maximal $t$-indistinguishable sets is at most quadratic in the number
of the items $t$. More precisely, $N_t \le 1 + t(t-1)/2$ for any $1 \le t \le n$.


Proof The proof is by induction. First, $N_1 = 1$ (and $Q_1 = \{1\}$) since the first decision of each
expert is 0. Now assume that $N_t \le 1 + t(t-1)/2$ for some $1 \le t \le n - 1$. When the next item $x_t$
arrives, an expert $p$ with state $s$ decides 1 in the next step if and only if $0 \le s - x_t < p$. Therefore,
as each expert belonging to the same indistinguishable set has the same state, the $k$-th maximal
$t$-indistinguishable interval with state $s$ is split into two subintervals if and only if
$q_{k-1,t} < s - x_t < q_{k,t}$ (experts in this interval with parameters larger than $s - x_t$ will form one
subset, and the ones with parameter at most $s - x_t$ will form the other one). As the number of
possible states after $t$ decisions (the number of different possible values of $s - x_t$) is at most $t$ by
Lemma 1, it follows that at most $t$ intervals can be split, and so $N_{t+1} \le N_t + t \le 1 + t(t+1)/2$, where
the second inequality holds by the induction hypothesis. ∎
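The splitting rule in the proof of Lemma 7 translates directly into code. The sketch below (function names hypothetical; exact arithmetic via `fractions`; the decision rule that expert $p$ opens a new bin iff its free space is below $p$, and the common initial state $s_{p,0} = 1$, are our assumptions) maintains the maximal indistinguishable intervals together with their common states and records $N_t$. A brute-force simulation of individual experts, `decisions`, is included so that the partition can be cross-checked on a grid of thresholds; this also illustrates Lemma 6, since experts with the same decision history turn out to form contiguous intervals.

```python
import random
from fractions import Fraction

def indistinguishable_intervals(items, s0=Fraction(1)):
    """Return (N_1..N_n, final sets); each set is (lo, hi, s): thresholds in (lo, hi]
    that share the decision history so far and the common free space s."""
    sets = [(Fraction(0), Fraction(1), s0)]
    counts = []
    for x in items:
        split = []
        for lo, hi, s in sets:
            if lo < s < hi:              # (lo, s] keeps the bin, (s, hi] opens a new one
                split += [(lo, s, s), (s, hi, s)]
            else:
                split.append((lo, hi, s))
        counts.append(len(split))        # N_t: maximal t-indistinguishable sets
        sets = []
        for lo, hi, s in split:
            if s <= lo:                  # s < p for every p in (lo, hi]: new bin
                s = 1 - x
            elif x <= s:                 # bin kept open and the item fits
                s = s - x
            sets.append((lo, hi, s))     # otherwise the item is lost, s unchanged
    return counts, sets

def decisions(p, items, s0=Fraction(1)):
    """Brute force: decision sequence of the single expert p."""
    s, out = s0, []
    for x in items:
        f = 1 if s < p else 0
        out.append(f)
        if f == 1:
            s = 1 - x
        elif x <= s:
            s = s - x
    return tuple(out)

rng = random.Random(0)
items = [Fraction(rng.randint(1, 10), 10) for _ in range(8)]
counts, sets = indistinguishable_intervals(items)
print(counts)   # each N_t is at most 1 + t(t-1)/2
```

A given state value lies strictly inside at most one of the disjoint intervals, which is why each distinct state can cause at most one split per round, exactly as in the counting argument.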


Lemma 7 shows that the “effective” number of constant-threshold experts is not too large. This 
fact makes it possible to apply our earlier algorithm for the case of finite expert classes with reason- 
able computational complexity. However, note that the number of “distinguishable” experts, that is, 
the number of maximal indistinguishable sets, constantly grows with time, and each indistinguishable
set contains a continuum of experts. Therefore we need to redefine the algorithm
carefully. This may be done by a two-level random choice of the experts: first we choose an indis- 
tinguishable expert set, then we pick one expert from this set randomly. The resulting algorithm is 
given in Figure 5. 
SEQUENTIAL ON-LINE BIN PACKING ALGORITHM WITH CONSTANT-THRESHOLD EXPERTS

Parameters: $\eta > 0$ and $m \in \mathbb{N}^+$.
Initialization: $w_{1,0} = 1$, $N_1 = 1$, $Q_1 = \{1\}$, $s_{1,0} = 1$ and $\hat{s}_0 = 1$.

For each round $t = 1, \ldots, n$,

(a) If $((t-1) \bmod m) = 0$ then
  - for $i = 1, \ldots, N_t$, compute the probabilities
  $$ p_{i,t} = \frac{w_{i,t-1}}{\sum_{j=1}^{N_t} w_{j,t-1}}; $$
  - randomly select an interval $J_t \in \{1, \ldots, N_t\}$ according to the probability distribution $\mathbf{p}_t = (p_{1,t}, \ldots, p_{N_t,t})$;
  - choose an expert $p_t$ uniformly from the interval $(q_{J_t-1,t}, q_{J_t,t}]$;
  otherwise, let $p_t = p_{t-1}$.
(b) Follow the decision of expert $p_t$: $I_t = f_{p_t,t}$.
(c) $x_t \in (0, 1]$, the size of the next item, is revealed.
(d) The algorithm incurs loss $\ell(I_t, x_t \mid \hat{s}_{t-1})$ and each expert $p \in (0, 1]$ incurs loss $\ell(f_{p,t}, x_t \mid s_{p,t-1})$.
(e) Compute the state $\hat{s}_t$ of the algorithm by (1), and calculate the auxiliary weights and states of the expert sets for all $i = 1, \ldots, N_t$ by
  $$ \tilde{w}_{i,t} = w_{i,t-1}\, e^{-\eta \ell(f_{i,t}, x_t \mid s_{i,t-1})}, $$
  $$ \tilde{s}_{i,t} = f_{i,t}(1 - x_t) + (1 - f_{i,t})\bigl(s_{i,t-1} - \mathbb{I}_{\{s_{i,t-1} \ge x_t\}}\, x_t\bigr). $$
(f) Update the end points of the intervals:
  $$ Q_{t+1} = Q_t \cup \bigcup_{i=1}^{N_t} \{\tilde{s}_{i,t} : q_{i-1,t} < \tilde{s}_{i,t} < q_{i,t}\} $$
  and $N_{t+1} = |Q_{t+1}|$.
(g) Assign the new states and weights to the $(t+1)$-indistinguishable sets
  $$ s_{i,t} = \tilde{s}_{j,t} \quad \text{and} \quad w_{i,t} = \tilde{w}_{j,t}\, \frac{q_{i,t+1} - q_{i-1,t+1}}{q_{j,t} - q_{j-1,t}} $$
  for all $i = 1, \ldots, N_{t+1}$ and $j = 1, \ldots, N_t$ such that $q_{j-1,t} < q_{i,t+1} \le q_{j,t}$.

Figure 5: Sequential on-line bin packing algorithm with constant-threshold experts.
Up to step (e) the algorithm is essentially the same as in the case of finitely many experts.
The two-level random choice of the expert is performed in step (a). In step (f) we update the
$t$-indistinguishable sets and usually introduce new indistinguishable expert sets. Because of these
new expert sets, the update of the weights $w_{i,t}$ and the states $s_{i,t}$ is performed in two steps, (e) and
(g): the actual update is made in step (e), while the reordering of these quantities according to the
new indistinguishable sets is performed in step (g), together with the introduction of the weights and
states for the newly formed expert sets. (Note that in step (g) the factor $(q_{i,t+1} - q_{i-1,t+1})/(q_{j,t} - q_{j-1,t})$
is the ratio of the lengths of the indistinguishable intervals that expert $q_{i,t+1}$ belongs to at
times $t + 1$ and $t$.)
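A compact Python sketch of the strategy in Figure 5 is given below. It is a sketch under assumptions, not a reference implementation: the loss and state update in `pack_step` are our reading of the model, the names `pack_step` and `threshold_packer` are hypothetical, floating-point arithmetic is used, steps (f) and (g) are carried out at the start of the following round (which yields the same sets and weights), and the weights are rescaled by their maximum after each update, which leaves the probabilities of step (a) unchanged but avoids underflow. Each indistinguishable set is a mutable list `[lo, hi, state, weight]` for the thresholds in $(lo, hi]$.

```python
import math
import random

def pack_step(s, f, x):
    """Assumed model: returns (loss, new free space) for decision f on item x."""
    if f == 1:
        return s, 1.0 - x            # close the bin (lose s); x goes into a new bin
    if x <= s:
        return 0.0, s - x            # x fits into the open bin
    return x, s                      # x is lost

def threshold_packer(items, eta, m, seed=None):
    """Returns (cumulative loss of the strategy, number of sets N_n)."""
    rng = random.Random(seed)
    sets = [[0.0, 1.0, 1.0, 1.0]]    # N_1 = 1, Q_1 = {1}, s_{1,0} = 1, w_{1,0} = 1
    s_alg, total, p_t = 1.0, 0.0, None
    for t, x in enumerate(items):
        # Steps (f)-(g): split each set at its state, sharing its weight
        # in proportion to the lengths of the two new intervals.
        new_sets = []
        for lo, hi, s, w in sets:
            if lo < s < hi:
                new_sets.append([lo, s, s, w * (s - lo) / (hi - lo)])
                new_sets.append([s, hi, s, w * (hi - s) / (hi - lo)])
            else:
                new_sets.append([lo, hi, s, w])
        sets = new_sets
        # Step (a): every m rounds draw a set by weight, then p_t uniformly in (lo, hi].
        if t % m == 0:
            lo, hi, _, _ = rng.choices(sets, weights=[e[3] for e in sets])[0]
            p_t = hi - (hi - lo) * rng.random()
        # Step (b): follow expert p_t, whose state is that of the set containing it.
        s_p = next(e[2] for e in sets if e[0] < p_t <= e[1])
        f_alg = 1 if s_p < p_t else 0
        # Steps (c)-(d): the item is revealed and the algorithm pays its loss.
        loss, s_alg = pack_step(s_alg, f_alg, x)
        total += loss
        # Step (e): exponential weights and state update for every set.
        for e in sets:
            f = 1 if e[2] <= e[0] else 0   # common decision of all p in (lo, hi]
            set_loss, e[2] = pack_step(e[2], f, x)
            e[3] *= math.exp(-eta * set_loss)
        top = max(e[3] for e in sets)      # rescaling only; probabilities unchanged
        for e in sets:
            e[3] /= top
    return total, len(sets)

rng = random.Random(3)
items = [rng.uniform(0.05, 1.0) for _ in range(60)]
print(threshold_packer(items, eta=0.5, m=6, seed=1))
```

Note that the decision in step (b) is computed from the state of the followed expert, while the loss is charged against the algorithm's own state $\hat{s}_{t-1}$; the two may differ right after a switch, which is the synchronization effect discussed earlier.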

The performance and complexity of the algorithm are given in the next theorem.


Theorem 8 Let $n \ge 1$, $\eta > 0$, $1 \le m \le n$, and $\delta \in (0,1)$. For any sequence $x_1, \ldots, x_n \in (0,1]$ of
items, the cumulative loss $\hat{L}_n$ of the randomized strategy defined above satisfies, for all $p \in (0, 1]$,
with probability at least $1 - \delta$,


2 1 io 
L x ee a he he 
2 Ss m 





where $l_{p,n}$ is the length of the maximal $n$-indistinguishable interval that contains $p$. Moreover, the
algorithm can be implemented with time complexity $O(n^3)$ and space complexity $O(n^2)$.


Remark 9 (i) By choosing m ~ n!’ andy ~n", the regret bound is of the order ofn?’ In(1/Ipn)- 
Note that the constant $\ln(1/l_{p,n})$ reflects the difficulty of the problem (similarly to, for example, the
notion of margin in classification, $l_{p,n}$ measures the freedom in choosing an optimal decision
boundary, that is, an optimal threshold). If the indistinguishable interval containing the optimal experts is
small, then the problem is hard (and the corresponding penalty term in the bound is large). On the
other hand, as $N_n \le 1 + n(n-1)/2$, if the classes of indistinguishable experts are more or less of
uniform size, then the corresponding term in the bound is of the order of $\ln n$. We show below that
this is always the case if there is a certain randomness in the item sizes.

(ii) The way of splitting the weight between new maximal indistinguishable classes in step (g)
could be modified in many different ways. For example, instead of assigning weights proportionally
to the length of the new intervals, one could simply give half of the weight to each of the two new classes.
In this case, instead of the term $\ln(1/l_{p^*,n})$ for the optimal expert $p^*$, we would get in the bound
the number of splits performed until reaching the optimal maximal $n$-indistinguishable class. The
hardness of the problem comes from the fact that the partitioning of the experts into maximal
indistinguishable classes is not known in advance. If we knew it, we could simply apply the algorithm
of Theorem 3 to the resulting $N_n$ experts (as in Theorem 4.1 of Cesa-Bianchi and Lugosi, 2006) to
obtain a uniformly good bound over all constant-threshold experts.


Proof It is easy to see that the two-level choice of the expert $p_t$ ensures that the algorithm is the
same as for the finite expert class with the experts defined by $Q_n$, with initial weights
$w_{i,0} = l_{q_{i,n},n} = q_{i,n} - q_{i-1,n}$ for the $n$-indistinguishable expert class containing $q_{i,n}$. Thus, Theorem 3 can be used to
bound the regret, where the number of experts is $N_n$.

For the second part note that the algorithm has to store the states, the intervals, the weights and
the probabilities, each of size $O(n^2)$ by Lemma 7. Concerning time complexity, the
algorithm has to update the weights and states in each round (requiring $O(n^2)$ computations per
round), and has to compute the probabilities once every $m$ steps, which requires $O(n^3/m)$
computations. Thus the time complexity of the algorithm is $O(n^3)$. ∎


Next we use Theorem 8 to show that, for many natural sequences of items, the algorithm above 
guarantees a small regret uniformly for all constant-threshold experts. In particular, we show that 
if item sizes are jittered by random noise, then the algorithm shown above has a small regret 
with respect to all constant-threshold experts (it is well-known that, for general systems, intro- 
ducing such random perturbations often reduces the sensitivity, and hence results in a more uni- 
form performance, for different values of the input). To this end, we simply need to show that 
$n$-indistinguishable intervals cannot be too short. We consider a simple model in which the item sizes
are noisy versions of an arbitrary fixed sequence. For simplicity we assume that the noise is uniformly
distributed, but the result remains true under more general circumstances. For illustration
purposes the simplified model is sufficient.


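Theorem 10 Let $y_1, \ldots, y_n \in (0,1]$ be arbitrary and define the item sizes by

$$
x_t = \begin{cases} y_t + \Theta_t & \text{if } y_t + \Theta_t \in (0,1], \\ 1 & \text{if } y_t + \Theta_t > 1, \\ 0 & \text{if } y_t + \Theta_t \le 0, \end{cases}
$$

where $\Theta_1, \ldots, \Theta_n$ are independent random variables, uniformly distributed on the interval $[-\varepsilon, \varepsilon]$
for some $\varepsilon > 0$. If the algorithm of Figure 5 is used with parameters $m = \bigl(16n/\ln(n^5/\varepsilon)\bigr)^{1/3}$ and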


n =  8mln(n>/e) /n, then with probability at least 1 — &— 1 / (4n), one has 





F 3 2/31,1/3 n 2n oe 
In- min Lpn <= l H4 : 4 
An ag gs (3 a) ® 


Proof The result follows directly from Theorem 8 if we show that the length of the shortest maximal
$n$-indistinguishable interval is at least $\varepsilon/n^5$ with probability at least $1 - 1/(4n)$ (with respect to the
distribution of the random noise). A very crude bound suffices to show this. Simply recall from
the proof of Lemma 7 that, at time $t$, a maximal $t$-indistinguishable interval $(p, q)$ is split if and only
if $x_t \in (s + p, s + q)$, where $s$ denotes the state of a corresponding constant-threshold expert. Note that
$(s + p, s + q) \subset (0, 1)$, since $x_t = 0$ or $x_t = 1$ cannot split any maximal $t$-indistinguishable interval,
but any such interval can be split by an appropriately chosen $x_t$. At time $t$ there are at most $t^2/2$
different maximal $t$-indistinguishable intervals and at most $t$ different states, so by the union bound,
the probability that there exists a maximal $t$-indistinguishable interval of length at most $\varepsilon/n^5$ that is
split at time $t$ is bounded by $t^3/2$ times the probability that $x_t \in (s + p, s + q)$ for a fixed interval with
$q - p \le \varepsilon/n^5$. Because of the assumption on how $x_t$ is generated, the latter probability is bounded by
$(q - p)/(2\varepsilon) \le 1/(2n^5)$ (the truncation of $x_t$ at 0 and 1 has no effect, because $(s + p, s + q) \subset (0, 1)$).
Hence, the probability that there exists a maximal $t$-indistinguishable interval of length at most $\varepsilon/n^5$
that is split at time $t$ is no more than $t^3/2 \cdot 1/(2n^5) \le 1/(4n^2)$. Thus, using the union bound again,
the probability that during the $n$ rounds of the game any maximal $t$-indistinguishable
interval of length at most $\varepsilon/n^5$ is split is at most $1/(4n)$, and therefore, with probability at least
$1 - 1/(4n)$, all maximal $n$-indistinguishable intervals have length at least $\varepsilon/n^5$, as desired. ∎
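The quantity controlled in the proof above, the length of the shortest maximal $n$-indistinguishable interval, can be computed for a simulated jittered sequence with the interval-splitting rule of Lemma 7. In this Python sketch the fixed sequence $y_t$, the noise level, the horizon and the seed are arbitrary illustrative choices, the function names are hypothetical, floating-point arithmetic is used, and the decision rule (open a new bin iff the free space is below $p$, common initial state 1) is our assumption. The printed shortest length can be compared with the threshold $\varepsilon/n^5$ from the proof.

```python
import random

def interval_partition(items, s0=1.0):
    """Maximal n-indistinguishable intervals (lo, hi] with their common states."""
    sets = [(0.0, 1.0, s0)]
    for x in items:
        split = []
        for lo, hi, s in sets:
            split += [(lo, s, s), (s, hi, s)] if lo < s < hi else [(lo, hi, s)]
        sets = []
        for lo, hi, s in split:
            if s <= lo:              # every expert in (lo, hi] opens a new bin
                s = 1.0 - x
            elif x <= s:             # the item fits into the open bin
                s = s - x
            sets.append((lo, hi, s))
    return sets

def jittered_items(y, eps, rng):
    """x_t = y_t + Theta_t truncated to [0, 1], with Theta_t uniform on [-eps, eps]."""
    return [min(1.0, max(0.0, yt + rng.uniform(-eps, eps))) for yt in y]

rng = random.Random(7)
n, eps = 40, 0.05
y = [0.3 if t % 3 else 0.6 for t in range(n)]    # an arbitrary fixed sequence
sets = interval_partition(jittered_items(y, eps, rng))
shortest = min(hi - lo for lo, hi, _ in sets)
print(len(sets), shortest, eps / n**5)
```

Since the proof bounds the failure probability by $1/(4n)$ rather than ruling failure out, a single run like this one illustrates the typical behavior but does not certify it.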




Remark 11 (i) The theorem above shows that, for example, if $\varepsilon = \Omega(n^{-\alpha})$ for some $\alpha > 0$ (i.e., if
the noise level is not too small), then the regret with respect to the best constant-threshold expert is
$O(n^{2/3} \ln^{1/3} n)$.

(ii) A similar model can be obtained if, instead of having perturbed item sizes, the experts
observe the free space in their bins with some noise. Thus, instead of $s_{p,t-1}$, expert $p$ observes
$s_{p,t-1} + \Theta_{p,t}$ truncated to the interval $[0,1]$, and makes decision $f_{p,t}$ based on this value. As in
the case of Theorem 10, we assume that the noise is independent over time, that is, the random
ensembles $\{\Theta_{p,t}\}_{p \in (0,1]}$ are independent for all $t$. If each component is identical, that is, $\Theta_{p,t} = \Theta_t$
for all $p \in (0,1]$, then essentially the same argument applies as in the previous theorem, and so
(4) holds if the sequence $\Theta_1, \ldots, \Theta_n$ satisfies the assumptions of Theorem 10. On the other hand,
if the components of the vectors are also independent, then the problem becomes more difficult, as
the $t$-indistinguishable classes may not be disjoint intervals anymore. An intermediate assumption
on the noise that still guarantees that (4) holds for this scenario is that $\Theta_{p,t} = \Theta_{q,t}$ if $p$ and $q$
are $t$-indistinguishable. Then the same argument as in Theorem 10 works, with the only difference
(omitting the effects of truncation to $[0,1]$) that here we have to estimate the probability that
$x_t \in (s + p + \Theta_{p,t}, s + q + \Theta_{q,t})$ for a fixed $x_t$ instead of estimating the probability that $x_t \in (s + p, s + q)$
with a randomized $x_t$. However, it is easy to see that the same bound holds in both cases.


Finally, we present a simple example that reveals that the loss of the best expert can be arbitrarily 
far from that of the optimal sequential off-line packing. 


Example 3 Let the sequence of items be

$$
(\underbrace{\varepsilon, 1-\varepsilon, \varepsilon, 1-\varepsilon, \ldots, \varepsilon, 1-\varepsilon}_{2k}, \varepsilon, \underbrace{1, 1, \ldots, 1}_{k}),
$$

where the number of items is $n = 3k + 1$ and $0 < \varepsilon < 1/2$. An optimal sequential off-line packing
is achieved if we drop any of the $\varepsilon$ terms; then the total loss is $\varepsilon$. In contrast, the loss of any
constant-threshold expert is $1 - \varepsilon + k$, independently of the choice of the parameter $p$. Namely, if
$p \le 1 - \varepsilon$, then the loss is 0 for the first $2k$ items, but afterwards the expert is stuck and suffers loss
$k + 1 - \varepsilon$. If $p > 1 - \varepsilon$, then the loss is $k$ for the first $2k$ items and $1 - \varepsilon$ for the rest of the
sequence.
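The numbers in Example 3 can be checked by direct simulation. The Python sketch below relies on two conventions that are our assumptions rather than statements of the text: the initial free space is $s_0 = 0$, and the free space left in the last open bin is added to the loss at the end. Under these conventions the off-line decision sequence that drops the first $\varepsilon$ has loss $\varepsilon$, and every constant-threshold expert has loss $k + 1 - \varepsilon$; the helper names `step` and `total_loss` are hypothetical.

```python
from fractions import Fraction

def step(s, f, x):
    """Assumed model: returns (loss, new free space) for decision f on item x."""
    if f == 1:
        return s, 1 - x              # close the bin, losing its free space s
    if x <= s:
        return Fraction(0), s - x    # the item is packed into the open bin
    return x, s                      # the item does not fit and is lost

def total_loss(policy, items, s0=Fraction(0)):
    """Loss of policy(t, s) -> {0, 1}, plus the free space of the last bin."""
    s, total = s0, Fraction(0)
    for t, x in enumerate(items):
        loss, s = step(s, policy(t, s), x)
        total += loss
    return total + s

k, eps = 5, Fraction(1, 10)
items = [eps, 1 - eps] * k + [eps] + [Fraction(1)] * k      # n = 3k + 1 items
offline = [0] + [1, 0] * k + [1] * k     # lose the first eps, then pair 1-eps with eps
print(total_loss(lambda t, s: offline[t], items))            # 1/10
for p in (Fraction(1, 2), Fraction(9, 10), Fraction(19, 20)):
    print(p, total_loss(lambda t, s: 1 if s < p else 0, items))   # loss 59/10 for each p
```

With $k = 5$ and $\varepsilon = 1/10$ the gap is $59/10 - 1/10 = 29/5$, and it grows linearly in $k$.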


6. Conclusions 


In this paper we provide an extension of the classical bin packing problem to an on-line sequential
scenario. In this setting items are received one by one, and before the size of the next item is 
revealed, the decision maker needs to decide whether the next item is packed in the currently open 
bin or the bin is closed and a new bin is opened. If the new item does not fit, it is lost. If a bin is 
closed, the remaining free space in the bin accounts for a loss. The goal of the decision maker is to 
minimize the loss accumulated over n periods. 

We give an algorithm that has a cumulative loss not much larger than that of any strategy in a finite set of reference
algorithms. We also study in detail the case when the class of reference strategies contains all
constant-threshold experts. We prove some negative results, showing that it is hard to compete with 
the overall best constant-threshold expert if no assumption is imposed on the item sizes. We also 
derive data-dependent regret bounds and show that under some mild assumptions on the data the 
cumulative loss can be made not much larger than that of any strategy that uses a fixed threshold
at each step to decide whether a new bin is opened. An interesting aspect of the problem is that 
the loss function has an (unbounded) memory. The presented solutions rely on the fact that one 
can “synchronize” the loss function in the sense that no matter in what state an algorithm is started, 
its loss may change only by a small additive constant. The result for constant-threshold experts is 
obtained by a covering of the uncountable set of constant-threshold experts such that the cardinality 
of the chosen finite set of experts grows only quadratically with the sequence length. The approach 
in the paper can easily be extended to any control problem where the loss function has such a 
synchronizable property. 